4. Map Management Module

4.1. Loop Closure

Our sliding window and marginalization scheme bounds the computational complexity, but it also introduces accumulated drift into the system. To eliminate this drift, we implement a relocalization process that identifies places that have already been visited. Feature-level connections between loop-closure candidates and the current frame are established continuously. These feature correspondences are tightly integrated in the Namespace pose_graph module, resulting in a drift-free state estimate.

Relocalization directly uses multiple observations of multiple features, together with a neighborhood of 10 keyframes, which results in higher accuracy and better smoothness. A graphical illustration of the relocalization procedure is shown in Fig. 4.1.

../../_images/Relocalization.png — Fig. 4.1 Diagram illustrating the relocalization and pose graph optimization procedure. (1) Relocalization starts with visual pose estimates (blue) and recorded historical states (green). If a loop is detected for the newest keyframe, as shown by the red lines, relocalization occurs. (2) Thanks to the use of feature-level correspondences, we are able to incorporate loop-closure constraints from multiple past keyframes. A keyframe is then added to the pose graph when it is marginalized out of the sliding window. (3) If there is a loop between this keyframe and any past keyframe, the loop-closure constraints, formulated as 6-DOF relative rigid body transforms, are added to the pose graph. The pose graph is optimized using all relative pose constraints in a standalone thread.

4.1.1. Loop Detection

We use DBoW2 [1], a popular place recognition library, for loop detection. Additional corner features are detected with the FAST detector and described with BRIEF descriptors; these extra corners improve the recall rate of loop detection. The descriptors are treated as visual words to query the visual database. Because only the descriptors are stored, we do not need to keep the raw images, which saves resources.

4.1.2. Feature Retrieval

When a loop is detected, the connection between the local sliding window and the loop-closure candidate is established by retrieving feature correspondences. Correspondences are found by BRIEF descriptor matching, namely Hamming distance calculation (Class KeyFrame:HammingDis()) and selection of the minimal distance [2]. Descriptor matching may produce some wrong matches, so we use an additional step to reject outliers: a 3D-2D PnP test with RANSAC, with ratio test rate 0.90 (Class KeyFrame:PnPRANSAC()). The PnP test is based on the known 3D positions of features in the local sliding window and their 2D observations in the loop-closure candidate image. After outlier rejection, we treat the surviving matches as a correct loop detection and perform relocalization.
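The descriptor matching step above can be sketched as follows. This is a minimal illustration, not the actual `KeyFrame::HammingDis()` code: the 256-bit descriptor length, the helper names `hammingDistance` and `findBestMatch`, and the `max_distance` rejection threshold are all assumptions made for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// A binary BRIEF descriptor. The 256-bit length is an assumption of this sketch.
using Descriptor = std::bitset<256>;

// Hamming distance: number of bit positions in which two descriptors differ.
inline int hammingDistance(const Descriptor& a, const Descriptor& b) {
    return static_cast<int>((a ^ b).count());
}

// Returns the index of the candidate with the minimal Hamming distance to
// `query`, or -1 if no candidate is closer than `max_distance`
// (a hypothetical threshold used here to discard weak matches).
inline int findBestMatch(const Descriptor& query,
                         const std::vector<Descriptor>& candidates,
                         int max_distance) {
    int best_index = -1;
    int best_distance = max_distance;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const int d = hammingDistance(query, candidates[i]);
        if (d < best_distance) {
            best_distance = d;
            best_index = static_cast<int>(i);
        }
    }
    return best_index;
}
```

Matches selected this way are only candidates; the PnP test with RANSAC described above is still needed to remove geometrically inconsistent ones.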

4.2. Global Pose Graph Map Merging

An additional pose graph optimization step runs separately to ensure that the collection of past estimates is registered into a globally consistent configuration.

4.2.1. Adding Keyframes into the Pose Graph

Keyframes are added into the pose graph after the VO estimator has processed them. Every keyframe serves as a vertex in the pose graph, and it connects to other vertices by two types of edges.

Sequential Edge:
A keyframe establishes several sequential edges to its previous keyframes. A sequential edge represents the relative transform between two keyframes. Considering keyframe \(i\) and one of its previous keyframes \(j\), the sequential edge contains the relative position \(\hat{\mathbf{P}}_{ij}^i\) and orientation (rotation matrix) \(\hat{\mathbf{R}}_{ij}^i\).

(4.1)\[\begin{split}\begin{align} \hat{\mathbf{R}}_{ij}^i &= {\hat{\mathbf{R}}_i^w}^{-1} \hat{\mathbf{R}}_j^w \\ \hat{\mathbf{P}}_{ij}^i &= {\hat{\mathbf{R}}_i^w}^{-1} \left( \hat{\mathbf{P}}_j^w - \hat{\mathbf{P}}_i^w \right) \end{align}\end{split}\]
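As a rough illustration, a sequential edge can be computed from two world-frame keyframe poses as sketched below: the relative rotation is the inverse of the rotation of keyframe \(i\) times the rotation of keyframe \(j\), and the relative position is the world-frame position difference \(\hat{\mathbf{P}}_j^w - \hat{\mathbf{P}}_i^w\) rotated into frame \(i\). The types `Pose` and `RelativePose` and the function `sequentialEdge` are hypothetical names for this example; plain arrays are used instead of a linear algebra library to keep the sketch self-contained.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Pose { Mat3 R; Vec3 P; };          // world-frame rotation and position
struct RelativePose { Mat3 R; Vec3 P; };  // transform expressed in frame i

// For a rotation matrix, the inverse equals the transpose.
Mat3 transpose(const Mat3& m) {
    Mat3 out{};
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t c = 0; c < 3; ++c) out[r][c] = m[c][r];
    return out;
}

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 out{};
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t c = 0; c < 3; ++c)
            for (std::size_t k = 0; k < 3; ++k) out[r][c] += a[r][k] * b[k][c];
    return out;
}

Vec3 multiply(const Mat3& a, const Vec3& v) {
    Vec3 out{};
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t k = 0; k < 3; ++k) out[r] += a[r][k] * v[k];
    return out;
}

// R_ij = R_i^{-1} R_j,  P_ij = R_i^{-1} (P_j - P_i).
RelativePose sequentialEdge(const Pose& i, const Pose& j) {
    const Mat3 Ri_inv = transpose(i.R);
    const Vec3 diff{{j.P[0] - i.P[0], j.P[1] - i.P[1], j.P[2] - i.P[2]}};
    return RelativePose{multiply(Ri_inv, j.R), multiply(Ri_inv, diff)};
}
```

Because the edge is expressed in the frame of keyframe \(i\), it stays valid when the whole trajectory is later shifted or rotated by the pose graph optimization.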

Loop-Closure Edge:
If the keyframe has a loop connection, it is connected to the loop-closure frame by a loop-closure edge in the pose graph. The value of the loop-closure edge is obtained from the relocalization results.

4.2.2. 6-DOF Pose Graph Optimization

The whole graph is optimized by minimizing a cost function defined over the set of sequential edges and the set of loop-closure edges.
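One common way to write a cost of this kind is sketched below. The notation is assumed for this sketch and is not taken from the implementation: \(\mathcal{S}\) is the set of sequential edges, \(\mathcal{L}\) is the set of loop-closure edges, \(\mathbf{r}_{ij}\) is the residual between the relative pose stored on the edge between keyframes \(i\) and \(j\) and the relative pose computed from the current vertex estimates, and \(\rho(\cdot)\) is a robust norm.

\[\min_{\mathbf{P}, \mathbf{R}} \left\{ \sum_{(i,j) \in \mathcal{S}} \left\| \mathbf{r}_{ij} \right\|^2 + \sum_{(i,j) \in \mathcal{L}} \rho\left( \left\| \mathbf{r}_{ij} \right\|^2 \right) \right\}\]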

Note

We add the Huber norm \(\rho(\cdot)\) to loop-closure edges to further reduce the impact of possible wrong loops. We do not use any robust norm for sequential edges, because these edges are extracted from VO, which already contains sufficient outlier rejection mechanisms.
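To illustrate how a Huber norm limits the influence of a wrong loop, the sketch below applies it to a squared residual norm. The function name `huber` and the `delta` parameter are assumptions for this example; the threshold actually used by the optimizer is not stated here.

```cpp
#include <cassert>
#include <cmath>

// Huber norm applied to a squared residual norm s = ||r||^2.
// Below the threshold delta^2 the cost is the plain squared norm;
// above it the cost grows only linearly in ||r||, so a large residual
// from a wrong loop contributes far less than it would quadratically.
// `delta` is a hypothetical tuning parameter of this sketch.
double huber(double squared_norm, double delta) {
    const double threshold = delta * delta;
    if (squared_norm <= threshold) {
        return squared_norm;
    }
    return 2.0 * delta * std::sqrt(squared_norm) - threshold;
}
```

With `delta = 1`, a residual of norm 10 costs 19 instead of 100, while residuals of norm at most 1 are left unchanged.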

The pose graph optimization and relocalization run asynchronously in two threads, which enables immediate use of the most recently optimized pose graph for relocalization whenever it becomes available.

Pose Graph Merging Without Global Reference
The pose graph can not only optimize the current map, but also merge the current map with a previously built map. If we have loaded a previously built map and have detected loop connections between the two maps, we can merge them. Since all edges are relative constraints, the pose graph optimization merges the maps automatically. As shown in Fig. 4.2, the current map is pulled into the prior map by loop edges. Because every vertex and edge is a relative variable, we only need to fix the first vertex in the pose graph.

../../_images/map_merging.png — Fig. 4.2 Illustration of map merging. The yellow figure is the prior map and the blue figure is the current map. They are merged based on loop connections.

Pose Graph Merging With GPS Initial Information
When initial GPS information is available to align with the initial visual estimate, merging becomes easier. We can directly initialize the camera and body coordinate frames using the GPS pose at that time.

4.3. Pose Graph Map Load & Save

We save the pose graph, namely the vertices and edges, as well as the descriptors of every keyframe. Raw images are discarded to reduce resource consumption. More specifically, the states we save for the \(i\)-th keyframe are

(4.2)\[\begin{equation} \left[ i, \hat{\mathbf{P}}_i^w, \hat{\mathbf{q}}_i^w, l, \hat{\mathbf{P}}_{il}^i, \hat{\mathbf{q}}_{il}^i, \mathrm{F}(u, v, \mathrm{des}) \right] \end{equation}\]

where \(i\) is the frame index, and \(\hat{\mathbf{P}}_i^w\) and \(\hat{\mathbf{q}}_i^w\) are the position and orientation (unit quaternion), respectively, acquired from the VO estimator. If this frame has a loop-closure frame, \(l\) is the loop-closure frame's index; \(\hat{\mathbf{P}}_{il}^i\) and \(\hat{\mathbf{q}}_{il}^i\) are the relative translation and rotation (unit quaternion) between these two frames, which are obtained from relocalization. \(\mathrm{F}(u, v, \mathrm{des})\) is the feature set, where each feature contains its 2D location and its BRIEF descriptor.
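The saved state in Eq. (4.2) maps naturally onto a small record type. The sketch below is only an illustration of that layout with a plain-text round trip; it is not the serialization format used by the system. The names `KeyframeRecord`, `Feature`, `save`, and `load`, the 256-bit descriptor length, the quaternion component order, and the use of `-1` for "no loop" are all assumptions made for this example.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <vector>

struct Feature {
    double u = 0.0, v = 0.0;  // 2D image location
    std::bitset<256> des;     // BRIEF descriptor (length assumed)
};

struct KeyframeRecord {
    int index = 0;                 // i
    std::array<double, 3> P_w{};   // position from the VO estimator
    std::array<double, 4> q_w{};   // unit quaternion (w, x, y, z order assumed)
    int loop_index = -1;           // l, or -1 when there is no loop (assumed)
    std::array<double, 3> P_il{};  // relative translation to loop frame
    std::array<double, 4> q_il{};  // relative rotation to loop frame
    std::vector<Feature> features; // F(u, v, des)
};

// Writes one record as whitespace-separated text.
void save(std::ostream& os, const KeyframeRecord& k) {
    os << std::setprecision(17) << k.index << ' ';
    for (double x : k.P_w) os << x << ' ';
    for (double x : k.q_w) os << x << ' ';
    os << k.loop_index << ' ';
    for (double x : k.P_il) os << x << ' ';
    for (double x : k.q_il) os << x << ' ';
    os << k.features.size() << '\n';
    for (const Feature& f : k.features)
        os << f.u << ' ' << f.v << ' ' << f.des << '\n';
}

// Reads one record in the same field order used by save().
KeyframeRecord load(std::istream& is) {
    KeyframeRecord k;
    is >> k.index;
    for (double& x : k.P_w) is >> x;
    for (double& x : k.q_w) is >> x;
    is >> k.loop_index;
    for (double& x : k.P_il) is >> x;
    for (double& x : k.q_il) is >> x;
    std::size_t n = 0;
    is >> n;
    k.features.resize(n);
    for (Feature& f : k.features) is >> f.u >> f.v >> f.des;
    return k;
}
```

Note that no image data appears in the record: the feature locations and descriptors are enough to re-establish loop edges after loading.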

We use the same serialization format to load keyframes. The loop edge is established directly from the loop information \(\hat{\mathbf{P}}_{il}^i\) and \(\hat{\mathbf{q}}_{il}^i\). Every keyframe establishes several sequential edges with its neighboring keyframes. After loading the pose graph, we immediately perform one global 6-DOF pose graph optimization. Because we use the Cereal serialization library, loading and saving times are short.

1: An open source C++ library for indexing and converting images into a bag-of-words representation. Please visit: https://github.com/dorian3d/DBoW2.
2: Nicosevici, T., & Garcia, R. (2012). Automatic visual bag-of-words for online robot navigation and mapping. IEEE Transactions on Robotics, 28(4), 886-898.